21:49
2026-06-13
autodidacts.io
large-language-models
How to fit Qwen 3.6 35B A3B into 16GB of VRAM, & run it with Llama.cpp on an RTX 3080
A guide explains how to run the Qwen 3.6 35B A3B model on an RTX 3080 with 16GB VRAM using Llama.cpp, offloading most layers to CPU to fit within memory constraints. The author details steps for insta…